Encoded speech recognition accuracy improvement in adverse environments by enhancing formant spectral bands

Authors

  • Shubha Kadambe
  • Ron Burns
Abstract

Spoken dialogue information retrieval applications are the future trend for mobile users in automobiles, on cellular phones, etc. Because resources on these platforms are limited, it may be advantageous to extract speech features, compress them, and transmit them to a central hub where computation-intensive tasks such as speech recognition and speech understanding can be performed. Generally, speech recognition accuracy degrades when the decoded speech signal (obtained by re-synthesizing the signal from the compressed features) is used. In addition, the background noise present in these mobile systems reduces recognition accuracy. Therefore, to improve recognition accuracy it is essential to extract robust features that jointly optimize compression and recognition. In this paper, we describe a technique that improves the recognition accuracy of noisy encoded speech signals by performing spectral correction and spectral formant band enhancement before synthesizing the speech signal from the compressed features. We conducted experiments on 1831 telephone speech utterances from 1831 speakers. To these utterances we added, at various signal-to-noise ratios (SNRs), (a) in-vehicle noise recorded in a Volvo car moving on an asphalt road at 134 km/h, (b) factory noise recorded in a factory, and (c) speech (babble) noise recorded in a cafeteria. Our experimental results indicate a recognition accuracy improvement of up to 10% at 0 dB SNR.
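
The abstract does not say how the noise recordings were scaled before being added to the utterances. As an illustration only, the sketch below uses one common convention: measure average power over the whole utterance and scale the noise so that the speech-to-noise power ratio matches the target SNR. The function names `mix_at_snr` and `measured_snr_db` are hypothetical and do not come from the paper.

```python
import math


def mix_at_snr(speech, noise, target_snr_db):
    """Add `noise` to `speech` after scaling it to a target SNR in dB.

    Assumption (not from the paper): power is averaged over the whole
    utterance, and the noise recording is at least as long as the speech.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]

    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR in dB from the clean signal and the noisy mixture."""
    p_speech = sum(s * s for s in speech)
    p_resid = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_resid)
```

At 0 dB, the condition for which the abstract reports its largest improvement, the scaled noise carries the same average power as the speech.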

Similar resources

Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language

Setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of emotion on the anger and happiness was evaluated and the results were compared with the neutral speech. So the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...

Reliable bands guided similarity measure for noise-robust speech recognition

Under noisy conditions, owing to the redundancy of the speech signal, there are some spectral bands (Reliable Bands) whose local SNRs are high enough to be used effectively by a recognizer. A novel, phonetically motivated Reliable Bands Guided similarity measure (RBG measure) is proposed in this study. It has the following features. Firstly, for reference spectrum, frequency bands which have larger ...

Auditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments

In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems in the presence of highly interfering car noise. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main formant frequencies of a speech signal using a multi-stream paradigm leads to an improvement in the recognition ...

Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments

In this paper, we propose a feature enhancement algorithm for wireless speech recognition in adverse acoustic environments. A speech recognition system is realized at the network side of a wireless communications system and feature parameters are extracted directly from the bitstream of the speech coder employed in the system, where the feature parameters are composed of spectral envelope infor...

Formant position based weighted spectral features for emotion recognition

In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, and therefore critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the nor...

Publication date: 2000